An Inference-based Policy Gradient Method
ثبت نشده
چکیده
In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides formalism for such abstraction over sequences of decisions. However most models require that options be given a priori, presumably specified by hand, which is neither efficient, nor scalable. Indeed, it is preferable to learn options directly from interaction with the environment. Despite several efforts, this remains a difficult problem: many approaches require access to a model of the environmental dynamics, and inferred options are often not interpretable, which limits our ability to explain the system behavior for verification or debugging purposes. In this work we develop a novel policy gradient method for the automatic learning of policies with options. This algorithm uses inference methods to simultaneously improve all of the options available to an agent, and thus can be employed in an off-policy manner, without observing option labels. Experimental results show that the options learned are interpretable. Further, we find that the method presented here is more sample efficient than existing methods, leading to faster and more stable learning of policies with options.
منابع مشابه
An Inference-based Policy Gradient Method
In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides formalism for such abstraction over sequences of decisions. However most models require that options be given a priori, presumably specified by hand, which is neither efficient, nor scalable. Indeed, it...
متن کاملCombine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning
In this paper we discuss very preliminary work on how we can reduce the variance in black box variational inference based on a framework that combines Monte Carlo with exhaustive search. We also discuss how Monte Carlo and exhaustive search can be combined to deal with infinite dimensional discrete spaces. Our method builds upon and extends a recently proposed algorithm that constructs stochast...
متن کاملLearning Robot Skill Embeddings
We present a method for reinforcement learning of closely related skills that are parameterized via a skill embedding space. We learn such skills by taking advantage of latent variables and exploiting a connection between reinforcement learning and variational inference. The main contribution of our work is an entropyregularized policy gradient formulation for hierarchical policies, and an asso...
متن کاملStein Variational Policy Gradient
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian in...
متن کاملBayesian Policy Search for Multi-Agent Role Discovery
Bayesian inference is an appealing approach for leveraging prior knowledge in reinforcement learning (RL). In this paper we describe an algorithm for discovering different classes of roles for agents via Bayesian inference. In particular, we develop a Bayesian policy search approach for Multi-Agent RL (MARL), which is model-free and allows for priors on policy parameters. We present a novel opt...
متن کامل